Introduction
Image interpolation and super-resolution are topics of great interest. The aim is to improve the resolution of a low-resolution (LR) image/video to obtain a high-resolution (HR) one which preserves the characteristics of natural images/videos. The major difference between interpolation and super-resolution is that interpolation only involves upsampling the low-resolution image, which is often assumed to be aliased due to direct down-sampling. Interpolation algorithms often exploit this aliasing property and perform de-aliasing of the LR image during the upsampling process. As a result, the high-frequency components of the upsampled HR image can be better recovered, preserving the characteristics of natural images [1]. However, natural images do not usually exhibit severe aliasing, so the upsampled HR image does not usually recover the high-frequency components sufficiently, which leads to a blurry image.
Besides the blurring effect, noise due to CCD limitations, transmission errors, etc., has to be handled. Super-resolution aims to address all these undesirable effects, including resolution degradation, blur and sometimes noise. Hence, super-resolution usually involves three major processes: upsampling (interpolation), deblurring and denoising.
The applications of image interpolation and super-resolution are very wide, including HDTV, image/video coding, image/video resizing, image manipulation, face recognition, view synthesis and surveillance. For many reasons, such as camera cost, insufficient bandwidth, storage limitations and limited computational power, the resolution of an image/video is often constrained. One intuitive example is the transmission of low-resolution content over the internet due to limited network bandwidth. Image interpolation and super-resolution algorithms serve to produce a high-quality, high-resolution image/video from the observed low-quality, low-resolution image/video.
The rest of this paper is organized as follows. Section II describes the formulation and theory of the major classes of image interpolation algorithms. Section III classifies super-resolution algorithms and briefly explains one major class using the FIR Wiener filter. In both sections, some experimental results, including execution times, are given. Furthermore, some future trends are included in each of the image interpolation and super-resolution sections, and a conclusion is given at the end of the paper.
Image Interpolation
A. Polynomial-based interpolation
Image interpolation aims to produce a high-resolution image by upsampling the low-resolution image. As explained in the introduction, interpolation algorithms often assume that the observed LR image is a directly down-sampled version of the HR image. Hence, the de-aliasing ability during the upsampling process is important, i.e. the recovery of the high-frequency signal from the aliased low-frequency signal [1].
For real-time applications, conventional polynomial-based interpolation methods such as bilinear and bicubic interpolation are often used due to their computational simplicity [2]–[5]. For example, the bicubic convolution interpolator requires only several arithmetic operations per pixel, such that real-time processing can easily be achieved [4]. The basic idea is to model the image signal by a low-order polynomial function (using some observed signals). However, polynomial functions are not good at modeling the signal's discontinuities (e.g. edges). Hence, the conventional polynomial-based interpolation methods often produce annoying artifacts such as aliasing, blur, halo, etc. around the edges. To resolve this problem, some adaptive polynomial-based and step function-based interpolation methods were proposed [6]–[9].
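As an illustration of the polynomial-based approach, the cubic convolution interpolation can be sketched with Keys' cubic kernel. The kernel form with parameter a = -0.5 is the standard convolution interpolation choice; the function names below are our own:

```python
def keys_kernel(s, a=-0.5):
    """Keys' cubic convolution kernel; a = -0.5 is the common choice."""
    s = abs(s)
    if s <= 1:
        return (a + 2) * s**3 - (a + 3) * s**2 + 1
    if s < 2:
        return a * s**3 - 5 * a * s**2 + 8 * a * s - 4 * a
    return 0.0

def bicubic_1d(samples, t, a=-0.5):
    """Interpolate at fractional offset t (0 <= t < 1, measured from
    samples[1]) using the four neighbouring samples."""
    return sum(samples[k] * keys_kernel(t - (k - 1), a) for k in range(4))

# The four kernel weights sum to one, so a smooth ramp is reproduced exactly.
print(bicubic_1d([1.0, 2.0, 3.0, 4.0], 0.5))  # -> 2.5
```

Separable application of this 1-D kernel along rows and then columns gives the usual bicubic image upsampler, which is why only a few arithmetic operations per pixel are needed.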
B. Edge-directed interpolation
B1. Explicit interpolation
Since edges are visually attractive to the human perceptual system, some edge-directed interpolation methods have been developed to address edge reconstruction [10]–[33]. In fact, the adaptive polynomial-based methods can be regarded as edge-directed methods as well. The basic idea of edge-directed methods is to preserve the edge sharpness during the upsampling process. The intuitive way is to explicitly estimate the edge orientation and then interpolate along that orientation [10]–[13]. To achieve low computation, some methods further quantize the edge orientations [12]–[13]. However, the interpolation quality of this intuitive approach is constrained by the estimation accuracy of the edge orientation. Since the edges of natural images are often blurred, blocky, aliased and noisy, the estimation of edge orientations is usually unstable. The interpolation quality of these methods can be improved by weighting the edge orientations, as described in the next section.
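A minimal sketch of the explicit approach, assuming a simple two-orientation (diagonal) quantization; the function and its decision rule are illustrative, not any specific published method:

```python
def edge_directed_pixel(nw, ne, sw, se):
    """Interpolate the missing HR pixel at the centre of four diagonal LR
    neighbours by averaging along the diagonal with the smaller intensity
    variation, i.e. along the estimated edge orientation."""
    d45 = abs(ne - sw)    # variation across the 45-degree diagonal pair
    d135 = abs(nw - se)   # variation across the 135-degree diagonal pair
    if d45 <= d135:       # smoother along 45 degrees: interpolate that way
        return (ne + sw) / 2.0
    return (nw + se) / 2.0

# An edge along the 45-degree diagonal: that diagonal is smooth, so the
# interpolated pixel follows it instead of blending across the edge.
print(edge_directed_pixel(nw=200.0, ne=10.0, sw=10.0, se=60.0))  # -> 10.0
```

If the diagonal differences are computed on blurred or noisy data, the orientation decision flips easily, which is exactly the instability discussed above.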
B2. Fusion of edge orientations
One major improvement to the explicit methods [10]–[13] is to adaptively fuse the results of several estimates along different edge orientations [14]–[18], [32]–[33]. The fusion process is usually computationally inexpensive. In [15]–[16], [32]–[33], two directionally interpolated results are fused using linear minimum mean squares error (LMMSE) estimation. The hidden Markov random field (HMRF) was used to fuse two diagonally interpolated results and the bicubic interpolated result together [18]. Compared with the LMMSE methods [15]–[16], [32]–[33], the HMRF method also considers the consistency of the fusion results by making use of the transition of states. Due to advances in the graphics processing unit (GPU), the GPU is often able to assist directional image interpolation in achieving real-time upsampling [32]–[33].
B3. New edge-directed interpolation (NEDI)
New edge-directed interpolation (NEDI) uses the FIR Wiener filter, or equivalently, the linear minimum mean squares error estimator [19], for linear prediction. Let us briefly describe the formulation of NEDI as an introduction. The fourth-order linear estimation model is given by
Y_{i}=\sum_{k=1}^{4}A_{k}\cdot Y_{i,k}+\varepsilon_{i},\quad {\rm for}\quad i=1,2,\ldots,P\eqno\hbox{(2.1)}
where Y_{i,k} denotes the k-th neighboring LR pixel of Y_{i}. The model parameters are obtained by the least squares estimation
\{\hat{A}_{k}\}=\mathop{\arg\min}_{\{A_{k}\}}\sum_{i=1}^{P}\left[Y_{i}-\sum_{k=1}^{4}A_{k}\cdot Y_{i,k}\right]^{2}\eqno\hbox{(2.2)}
which can be written in matrix form as
\hat{{\bf A}}=\mathop{\arg\min}_{\bf A}\Vert {\bf Y}-{\bf Y}_{{\bf A}}{\bf A}\Vert_{2}^{2}\eqno\hbox{(2.3)}
{\bf Y}=\{Y_{i}\}^{T},\quad{\bf Y}_{{\bf A}}=\{Y_{i,k}\},\quad{\bf A}=\{A_{k}\}^{T}\eqno\hbox{(2.4)}
The closed-form least squares solution is
\hat{{\bf A}}=({\bf Y}_{{\bf A}}^{T}{\bf Y}_{{\bf A}})^{-1}{\bf Y}_{{\bf A}}^{T}{\bf Y}\eqno\hbox{(2.5)}
The missing HR pixel X is then interpolated from its four neighboring LR pixels X_{k} using the estimated weights:
X=\sum_{k=1}^{4}\hat{A}_{k}\cdot X_{k}\eqno\hbox{(2.6)}
For a missing data point between available LR data points, the same prediction model (2.6) applies, as illustrated in the example below.
NEDI uses the LR image to estimate the HR covariances for the FIR Wiener filter, as shown in (2.5) and (2.6). There are many algorithms based on the idea of NEDI [21]–[30]. In [21], a sixth-order filter design was proposed to avoid error accumulation in the original two-step design. In [22], an eighth-order filter was proposed to include the vertical-horizontal correlation. The filter weights of the FIR Wiener filter were regularized for stability [23]. In [31], instead of using the LR image to estimate the HR covariances, relevant HR image pairs were searched from an offline dictionary to estimate the HR covariances. Let us clarify this filtering process with the following over-simplified example.
Example:
Consider a second-order version of the model, Y(i)=A_{1}Y(i-2)+A_{2}Y(i-1)+\varepsilon_{i}, trained on a data sequence with
{\bf Y}=[\matrix{Y(2) & Y(3) & Y(4) & Y(5) & Y(6) & Y(7) & Y(8) & Y(9)}]^{T}
{\bf Y}_{{\bf A}}=\left[\matrix{Y(0) & Y(1) & Y(2) & Y(3) & Y(4) & Y(5) & Y(6) & Y(7)\cr Y(1) & Y(2) & Y(3) & Y(4) & Y(5) & Y(6) & Y(7) & Y(8)}\right]^{T}
For a linearly increasing sequence, the least squares solution (2.5) gives \hat{A}_{1}=-1 and \hat{A}_{2}=2, so the next sample is predicted as
Y(10)=\hat{A}_{1}Y(8)+\hat{A}_{2}Y(9)=2Y(9)-Y(8).
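The example above can be verified numerically. The sketch below assumes a linear ramp as the data sequence (chosen purely for illustration) and solves (2.5) with NumPy's least squares routine:

```python
import numpy as np

# Second-order model: predict Y(i) from its neighbours Y(i-2) and Y(i-1).
Y = np.arange(11, dtype=float)                    # linear ramp Y(i) = i
targets = Y[2:10]                                 # Y(2) ... Y(9)
neighbours = np.column_stack([Y[0:8], Y[1:9]])    # rows [Y(i-2), Y(i-1)]

# Least squares solution A = (Y_A^T Y_A)^{-1} Y_A^T Y, as in (2.5).
A, *_ = np.linalg.lstsq(neighbours, targets, rcond=None)

# Predict the next sample with the learned weights, as in (2.6).
prediction = A @ np.array([Y[8], Y[9]])
print(A, prediction)   # weights close to [-1, 2], prediction close to 10
```

For the ramp the system is exactly consistent, so the least squares weights coincide with the linear extrapolation rule 2Y(9)-Y(8).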
B4. Soft-decision interpolation (SAI)
The soft-decision adaptive interpolation (SAI) [24] was proposed to interpolate a block of pixels at one time using the idea of NEDI. It has the benefit of block-based estimation: the statistical consistency is constrained within the block region, which comprises both observed LR and unobserved HR pixels. Within a local window as shown in Figure 2, the unobserved HR pixels {\bf X} are estimated from the observed LR pixels {\bf Y} in the maximum a posteriori sense:
\eqalignno{& \mathop{\arg\max}_{\bf X}\ p({\bf X}\vert{\bf Y})=\mathop{\arg\max}_{\bf X} \log [p({\bf Y}\vert{\bf X})p({\bf X})]\cr & =\mathop{\arg\min}_{\bf X}\left(\Vert {\bf X}-{\bf Y}_{{\bf X}}{\bf A}\Vert_{2}^{2}+\Vert\tilde{{\bf Y}}-{\bf X}_{{\bf Y}}{\bf A}\Vert_{2}^{2}+\lambda\Vert\tilde{{\bf X}}-{\bf X}_{{\bf X}}{\bf B}\Vert_{2}^{2}\right) &\hbox{(2.7)}}
This can be rearranged into the compact quadratic form
\mathop{\arg\min}_{\bf X} ({\bf CX}-{\bf DY})^{T} ({\bf CX}-{\bf DY})\eqno\hbox{(2.8)}
whose closed-form solution is
{\bf X}=({\bf C}^{T}{\bf C})^{-1}{\bf C}^{T}{\bf DY}\eqno\hbox{(2.9)}
B5. Robust soft-decision interpolation using weighted least squares (RSAI)
It is well known that the least squares estimation is not robust to outliers; hence, the weighted least squares was proposed to improve the accuracy and robustness of the SAI [25]–[26]. The robust SAI (RSAI) incorporates weights on all residuals in the cost function of the SAI, as follows:
\mathop{\arg\min}_{\bf X}\left(\Vert {\bf W}_{1}({\bf X}-{\bf Y}_{{\bf X}}{\bf A})\Vert_{2}^{2}+\Vert {\bf W}_{2}(\tilde{{\bf Y}}-{\bf X}_{{\bf Y}}{\bf A})\Vert_{2}^{2}+\lambda\Vert {\bf W}_{3}(\tilde{{\bf X}}-{\bf X}_{{\bf X}}{\bf B})\Vert_{2}^{2}\right)\eqno\hbox{(2.10)}
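Each term of (2.10) is a weighted least squares problem. A generic sketch of how such a problem is solved and why the weighting confers robustness; the data and weight values below are made up for illustration:

```python
import numpy as np

def weighted_least_squares(C, d, w):
    """Solve min_x ||W(Cx - d)||_2^2 with diagonal weight matrix W = diag(w),
    via the normal equations (C^T W^2 C) x = C^T W^2 d."""
    W2 = np.diag(w ** 2)
    return np.linalg.solve(C.T @ W2 @ C, C.T @ W2 @ d)

# Fit a line to samples of d = 2t + 1 containing one gross outlier.
t = np.arange(6, dtype=float)
d = 2.0 * t + 1.0
d[3] = 100.0                                   # outlier
C = np.column_stack([t, np.ones_like(t)])
w_robust = np.ones(6)
w_robust[3] = 1e-3                             # nearly ignore the outlier
print(weighted_least_squares(C, d, np.ones(6)))   # pulled away by the outlier
print(weighted_least_squares(C, d, w_robust))     # close to the true [2, 1]
```

Downweighting the outlying residual recovers the underlying trend, which is the same mechanism the RSAI uses on the residuals of (2.10).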
B6. Bilateral soft-decision interpolation for real-time applications (BSAI)
The SAI and RSAI are able to produce the best quality, but their computational cost is high. To largely reduce the computation, the least squares estimation should be avoided or reduced to a single-parameter estimation. Hence, the bilateral soft-decision interpolation (BSAI) was proposed to use the bilateral filter to replace the least squares estimation of the model parameters, and to reduce the weighted least squares soft-decision estimation to a single-parameter estimation [17]. The BSAI uses bilateral filter weights U_{k} in the single-parameter soft-decision cost function
\mathop{\arg\min}_{X}\left\{\left(X-\sum_{k=0}^{3}A_{k}X_{k}\right)^{2}+\sum_{k=0}^{3}U_{k}\left(X_{k}-\sum_{i=0}^{3}A_{i}X_{ki}\right)^{2}\right\}\eqno\hbox{(2.11)}
where X_{ki} denotes the i-th neighbor of X_{k}, with X itself appearing as the neighbor of index l=3-k. The closed-form solution is
X={\sum\limits_{k=0}^{3}A_{k}X_{k}+\sum\limits_{k=0;\ l=3-k}^{3}U_{k}A_{l}\left(X_{k}-\sum\limits_{i=0;\ i\neq l}^{3}A_{i}X_{ki}\right)\over 1+U_{0}A_{3}^{2}+U_{1}A_{2}^{2}+U_{2}A_{1}^{2}+U_{3}A_{0}^{2}}\eqno\hbox{(2.12)}
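Since (2.12) follows from setting the derivative of (2.11) to zero, the closed form can be checked numerically. All weights and samples below are random, made-up values; the indexing assumes X is the neighbor of X_k at position 3-k, consistent with the denominator of (2.12):

```python
import numpy as np

rng = np.random.default_rng(0)
A = np.array([0.3, 0.2, 0.25, 0.25])      # made-up model weights
U = np.array([0.5, 0.8, 0.6, 0.7])        # made-up bilateral weights
Xk = rng.uniform(0, 255, 4)               # the four neighbours of X
Xki = rng.uniform(0, 255, (4, 4))         # neighbours of each X_k

def cost(X):
    """The soft-decision cost (2.11); X is the neighbour of X_k at index 3-k."""
    total = (X - A @ Xk) ** 2
    for k in range(4):
        nb = Xki[k].copy()
        nb[3 - k] = X                      # X itself is a neighbour of X_k
        total += U[k] * (Xk[k] - A @ nb) ** 2
    return total

# Closed-form minimiser (2.12)
num = A @ Xk
for k in range(4):
    l = 3 - k
    s = sum(A[i] * Xki[k][i] for i in range(4) if i != l)
    num += U[k] * A[l] * (Xk[k] - s)
den = 1 + sum(U[k] * A[3 - k] ** 2 for k in range(4))
X_star = num / den

# The closed form indeed minimises the quadratic cost.
print(cost(X_star) <= cost(X_star + 1.0), cost(X_star) <= cost(X_star - 1.0))
```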
B7. Recent trends of image interpolation applications
Recently, image interpolation has benefited from the development of sparse representation. A family of linear estimators corresponding to different priors is mixed using the sparse representation to give the final estimates for image interpolation [34]. Its performance is on a par with the SAI [24]. Clearly, the sparse representation, which has been shown to be successful in compressive sensing, denoising, restoration, super-resolution, etc., is a fruitful research direction for image interpolation.
Another trend in image interpolation applications is the real-time upsampling of an LR video sequence for future very high-definition TVs, such as videos with 4K resolution. The HR video sequence can be obtained by fusing several low-resolution (LR) frames into one high-resolution (HR) frame. For such real-time applications, nonlocal means (a weighted sum filter) can be directly applied due to its simplicity in computing the filter weights and its high performance. The nonlocal means can be used with a linear motion model to better estimate the filter weights, which is optimized for the upsampling [35]. After fast upsampling, some simple restoration techniques can be applied to deblur the video. In the future, there should be more of these fast algorithms which can perform upsampling in real time; the real-time requirement is the major difficulty in their development.
Recently, view synthesis has attracted much attention in the image processing community. In 3D video and multi-view synthesis, the major issue is to fill the newly exposed area (hole region). Hole filling in view synthesis resembles the video (multi-frame) interpolation scenario. Hence, multi-frame interpolation techniques such as nonlocal means can be adopted to fill the hole. Specifically, the nonlocal means can be modified to cope with irregular hole sizes and the extra depth information [36]. Similar to eqn. (2.6), the nonlocal means for hole filling can be defined as
x=\sum_{i\in N}w_{i}x_{i}\eqno\hbox{(2.13)}
c(x,x_{i})=\sum_{j\in W_{x};k\in W_{x_{i}}}[x(j)-x_{i}(k)]^{2}[d_{x}(j)-d_{x_{i}}(k)]^{2}\eqno\hbox{(2.14)}
w_{i}={\exp(-c(x,x_{i})/\sigma_{c})\cdot\exp(-d(x,x_{i})/\sigma_{d})\over \sum\limits_{i'\in N}\exp(-c(x,x_{i'})/\sigma_{c})\cdot\exp(-d(x,x_{i'})/\sigma_{d})}\eqno\hbox{(2.15)}
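A toy sketch of the weighted-sum filling in (2.13)–(2.15); the candidate pixels, cost values and bandwidths sigma_c, sigma_d below are invented purely for illustration:

```python
import numpy as np

def nonlocal_means_fill(candidates, costs_c, costs_d, sigma_c=10.0, sigma_d=10.0):
    """Fill a missing pixel as the weighted sum (2.13) of candidate pixels,
    with weights (2.15) built from a patch-matching cost and a depth cost."""
    w = np.exp(-np.asarray(costs_c) / sigma_c) * np.exp(-np.asarray(costs_d) / sigma_d)
    w = w / w.sum()                       # normalisation as in (2.15)
    return float(w @ np.asarray(candidates))

# The first candidate matches the hole's surroundings far better (lower
# costs), so the filled value lands very close to it.
filled = nonlocal_means_fill([50.0, 200.0], costs_c=[1.0, 80.0], costs_d=[0.5, 40.0])
print(filled)
```

The exponential weighting makes poorly matching candidates contribute almost nothing, which is what allows the filter to cope with irregular hole shapes: only the candidates that agree with both texture and depth dominate the sum.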
C. PSNR and SSIM comparisons of several image interpolation algorithms
Let us compare the subjective and objective quality of some of the image interpolation algorithms described in this paper. Table I and Table II show the PSNR and SSIM values of various image interpolation algorithms [4], [15], [17], [19], [24], [25] using 24 natural images from Kodak. We have also measured the execution times of these image interpolation algorithms using C++ code, as shown in Table III (see also Figure 5). The RSAI [25] achieves the highest average PSNR and SSIM, but its execution time is longer than those of the second and third best performers, i.e. the SAI [24] and the BSAI [17]. The BSAI requires much less computation than the SAI, while its quality is close to that of the SAI. For real-time applications, the BSAI is the best choice; for offline applications, the RSAI is the best choice.
D. Classification of image interpolation algorithms
In this section, we broadly classify image interpolation algorithms, as shown in Figure 4. They comprise the polynomial-based approaches, which are fast and can adapt to local statistics, and the edge-directed approaches, which directly address the edge reconstruction criterion. Edge-directed methods can intuitively interpolate along one edge orientation, fuse several estimates along different edge orientations, or minimize the linear mean squares error, for example using the FIR Wiener filter; soft-decision interpolation can also be applied.
Super-resolution
A. Single-image and multi-frame super-resolution
Super-resolution (SR) aims to produce a high-resolution (HR) image using one or several observed low-resolution (LR) images by upsampling, deblurring and denoising. Multi-frame SR may also involve registration and fusion processes. It is interesting to point out that some SR algorithms do not involve the denoising process, and some interpolation algorithms are also referred to as super-resolution algorithms. Generally speaking, super-resolution methods can be classified into single-image super-resolution (only one LR image is observed) [37]–[54] and multi-frame super-resolution [55]–[71]. Multi-frame super-resolution for a video sequence can moreover use a recursive estimation approach over video frames, which is called dynamic super-resolution [55]–[58]. Dynamic super-resolution makes use of the previously reconstructed HR frame to estimate the current HR frame.
B. Reconstruction-based and learning-based super-resolution
There are two major categories of SR algorithms: reconstruction-based and learning-based algorithms. Reconstruction-based algorithms [37]–[39], [55]–[65], [67]–[71] use the data fidelity function, usually with prior knowledge to regularize the ill-posed solution. Gradients (edges) are the main features used as the prior knowledge. Some more general prior knowledge using nonlocal self-similarity was also proposed [62]. Learning-based algorithms [40]–[54], [66] moreover utilize information from training images. Generally, when an LR image is observed, a patch within that LR image is extracted and searched within the trained dictionary to estimate a suitable HR patch for reconstructing the HR image. Some recent investigations [41]–[43] make use of the sparse representation for estimating the HR patch and for training the dictionary. By combining the use of prior knowledge and dictionary training, some unified frameworks [42]–[44] for single-image SR were proposed to further regularize the solution estimated from the training sets.
C. Generic and face super-resolution
Learning-based algorithms are most suitable for specific applications, such as face super-resolution. Since face SR is crucial in applications like face recognition, surveillance, etc., a number of super-resolution methods have been proposed for hallucinating faces [48]–[54]. These are training-based methods which make use of the common characteristics of human faces (e.g. the eigenface) to design the formulation for dictionary training and HR image reconstruction.
D. Blind and non-blind super-resolution
Super-resolution methods can also be classified as blind and non-blind methods. Blind methods treat the point spread function (PSF) that represents the blur, and the registration parameters, as variables to be estimated simultaneously with the HR image [67]–[71]. The Bayesian or MAP framework is widely used to jointly estimate these variables. Non-blind methods remain widely used because the PSF can be approximated, can be set according to user preference, or is known from knowledge of the camera; the registration parameters can also be estimated separately. Usually, non-blind methods apply regularizations to make them robust to inaccurate estimates of the PSF, registration parameters, etc. The simplicity of their formulations also makes them easier to parallelize for practical applications.
E. FIR Wiener filter for super-resolution
The finite impulse response (FIR) Wiener filter, or equivalently, the linear minimum mean squares error (LMMSE) estimator, can be applied to perform super-resolution reconstruction. Block-based FIR Wiener filters were proposed for multi-frame SR [65]–[66]. In [65], a wide-sense stationary correlation function based on the geometric distance between pixels was proposed to estimate the covariances for the FIR Wiener filter, and elegant results were reported. The partition filter partitions an image into blocks for applying the FIR Wiener filter, where the filter weights are learned from an offline dictionary and retrieved during the online estimation [66].
Similarly, the Gaussian process regression (GPR) can be applied for super-resolution reconstruction. GPR provides a sophisticated Bayesian framework to estimate the HR image and the hyper-parameters of the process (e.g. noise variance, and variance of the correlation function) using a pilot HR image, which can be obtained by the bicubic interpolation [37]. The nonlocal means has been proposed as the correlation function in GPR [37]. Using the nonlocal means as the correlation function, the iterative scheme of FIR Wiener filter can alternatively update the estimated covariances and HR image, which can address the disadvantage of inaccurate pilot HR image [47].
Let us briefly introduce the iterative Wiener filter (IWF) algorithm [47] which currently produces the best PSNR and SSIM results among some state-of-the-art algorithms, as shown in Table IV and Table V. The formulation of the FIR Wiener filter which minimizes the linear mean squares error is given by
{\bf W}_{i}={\bf R}_{i}^{-1}{\bf p}_{i}\eqno\hbox{(3.1)}
\hat{{\bf y}}_{i}={\bf W}_{i}^{T}{\bf x}_{i}\eqno\hbox{(3.2)}
where {\bf R}_{i} and {\bf p}_{i} are the auto-correlation and cross-correlation of the observed vector {\bf x}_{i} and the unobserved vector {\bf y}_{i}. Normalizing the correlations by c_{\max} does not change the filter:
\eqalignno{{\bf W}_{i} & =\left[c_{\max}E\{{\bf x}_{i}{\bf x}_{i}^{T}/c_{\max}\}\right]^{-1}c_{\max}E\{{\bf x}_{i}{\bf y}_{i}^{T}/c_{\max}\}\cr & =\left[E\{{\bf x}_{i}{\bf x}_{i}^{T}/c_{\max}\}\right]^{-1}E\{{\bf x}_{i}{\bf y}_{i}^{T}/c_{\max}\}&\hbox{(3.3)}}
The normalized correlation between two pixels p_{j} and p_{k} is modeled by the nonlocal-means correlation function
E\{p_{j}p_{k}/c_{\max}\}\approx\exp(-E(p_{j}-p_{k})^{2}/\sigma)\eqno\hbox{(3.4)}
Example:
A data sequence x(0), x(1), \ldots, x(4) is given, with the observed vector {\bf x}_{i}=[x(1)\quad x(3)]^{T} and the unobserved sample {\bf y}_{i}=x(2).
Let us initialize the unobserved sample as the average of the nearest two data points:
{\bf y}_{i}^{(0)}=(x(1)+x(3))/2
Let us compute the auto-correlation and cross-correlation matrices using the correlation function in (3.4), taking the expectation over the nearest three data-point pairs.
First iteration:
\eqalignno{{\bf R}_{i}^{(0)} & =\left[\matrix{\exp(-E(x(1)-x(1))^{2}/\sigma) & \exp(-E(x(1)-x(3))^{2}/\sigma)\cr \exp(-E(x(3)-x(1))^{2}/\sigma) & \exp(-E(x(3)-x(3))^{2}/\sigma)}\right]\cr & =\left[\matrix{\exp(-(0)/\sigma) & \exp(-E(x(1)-x(3))^{2}/\sigma)\cr \exp(-E(x(1)-x(3))^{2}/\sigma) & \exp(-(0)/\sigma)}\right]}
where the off-diagonal term is evaluated over the three nearest shifted pairs:
\eqalignno{& \exp(-E(x(1)-x(3))^{2}/\sigma)=\exp(-E(x(3)-x(1))^{2}/\sigma)\cr & =\exp(-[(x(0)-{\bf y}_{i}^{(0)})^{2}+(x(1)-x(3))^{2}+({\bf y}_{i}^{(0)}-x(4))^{2}]/\sigma)}
Suppose the correlation values evaluate to
{\bf R}_{i}^{(0)}=\left[\matrix{1 & 0.5\cr 0.5 & 1}\right]
The cross-correlation matrix is given by
\eqalignno{{\bf P}_{i}^{(0)} & =\left[\matrix{\exp(-E(x(1)-x(2))^{2}/\sigma)\cr \exp(-E(x(3)-x(2))^{2}/\sigma)}\right]\cr & =\left[\matrix{\exp(-[(x(0)-x(1))^{2}+(x(1)-{\bf y}_{i}^{(0)})^{2}+({\bf y}_{i}^{(0)}-x(3))^{2}]/\sigma)\cr \exp(-[({\bf y}_{i}^{(0)}-x(1))^{2}+(x(3)-{\bf y}_{i}^{(0)})^{2}+(x(4)-x(3))^{2}]/\sigma)}\right]}
Suppose the cross-correlation values evaluate to {\bf P}_{i}^{(0)}=[0.4\quad 0.6]^{T}. Then
{\bf W}_{i}^{(0)}=\left({\bf R}_{i}^{(0)}\right)^{-1}{\bf P}_{i}^{(0)}=\left[\matrix{1 & 0.5\cr 0.5 & 1}\right]^{-1}\left[\matrix{0.4\cr 0.6}\right]=\left[\matrix{0.1333\cr 0.5333}\right]
{\bf y}_{i}^{(1)}=\left({\bf W}_{i}^{(0)}\right)^{T}{\bf x}_{i}=[0.1333\quad 0.5333]\left[\matrix{x(1)\cr x(3)}\right]
For the subsequent iterations, the estimated output is substituted back into the computation of the auto-correlation and cross-correlation matrices until convergence or a termination criterion is met.
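The iteration above can be sketched in code for this 1-D example. The bandwidth sigma and the use of a mean over the three shifted pairs are our own illustrative choices, not the settings of [47]:

```python
import numpy as np

def iwf_missing_sample(x0, x1, x3, x4, sigma=10.0, iters=10):
    """Iterative Wiener filter estimate of the missing x(2) from the
    observed x(0), x(1), x(3), x(4), following (3.1)-(3.4)."""
    def corr(pairs):                      # correlation function (3.4)
        return float(np.exp(-np.mean([(a - b) ** 2 for a, b in pairs]) / sigma))

    y = (x1 + x3) / 2.0                   # initial pilot estimate
    for _ in range(iters):
        r13 = corr([(x0, y), (x1, x3), (y, x4)])
        R = np.array([[1.0, r13], [r13, 1.0]])
        p = np.array([
            corr([(x0, x1), (x1, y), (y, x3)]),   # corr(x(1), x(2))
            corr([(y, x1), (x3, y), (x4, x3)]),   # corr(x(3), x(2))
        ])
        W = np.linalg.solve(R, p)         # filter weights, as in (3.1)
        y = float(W @ np.array([x1, x3])) # updated estimate, as in (3.2)
    return y

# On the ramp 1, 2, ?, 4, 5 the estimate settles near the true value 3.
print(iwf_missing_sample(1.0, 2.0, 4.0, 5.0))
```

Each pass re-estimates the covariances from the latest pilot, which is exactly how the IWF compensates for an inaccurate initial pilot HR image.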
F. PSNR and SSIM comparison of several super-resolution algorithms
Table IV and Table V show the PSNR and SSIM values of 8 natural images for several super-resolution algorithms.
G. Summary of super-resolution algorithms
Let us also give a relational diagram for some major super-resolution algorithms. Figure 8 gives a brief classification of the super-resolution algorithms. They can be classified as single-image and multi-frame approaches. Single-image algorithms are mostly learning-based approaches which aim at reconstructing generic images and face images. The reconstruction-based algorithms often make use of gradients or patch redundancy as the prior knowledge to regularize the solution. The multi-frame algorithms can be divided into static and dynamic algorithms, of which the latter apply a recursive structure. Moreover, the multi-frame algorithms can also be classified into non-blind approaches and blind approaches, where the blind approaches simultaneously estimate the registration parameters and the PSF together with the HR image.
Conclusion
For image/video interpolation and super-resolution, more intensive research is expected due to the popularity of ultra high-definition TV and free-viewpoint TV. Specifically, multi-frame super-resolution and hole-filling interpolation for view synthesis are going to be popular directions in 2D and 3D video applications. For practical use of super-resolution, fast algorithms are in demand, and blind algorithms will receive more attention in future development. Furthermore, recent works show that face recognition can benefit from a customized face super-resolution which maximizes the differences between two face manifolds. However, the theoretical justification of this approach requires more investigation. Among the techniques for interpolation and super-resolution, sparse representations should be a promising direction, and significant results are already available in image processing applications.
To complete this review, let us also include a short highlight of review works in the literature. An early review of super-resolution algorithms is given in [72], and a statistical performance analysis of super-resolution is shown in [76]. The limitations and challenges of super-resolution were reviewed in [73]–[75], which show that a major limitation of multi-frame super-resolution is the registration accuracy. However, this may be resolved by making use of optical flow techniques [77], and the influence of inaccurate registration can also be alleviated by using an appropriate regularization [78].
ACKNOWLEDGMENT
This work is supported by the Center for Signal Processing, the Hong Kong Polytechnic University (U-G863IPIO-024), RGC-PolyU 5278/08E, and SmartEN Marie Curie ITN-Network.